Overview

Dataset statistics

Number of variables: 27
Number of observations: 899164
Missing cells: 751259
Missing cells (%): 3.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 185.2 MiB
Average record size in memory: 216.0 B
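
The two derived figures in this block can be re-derived from the raw counts. The following is a minimal sketch, assuming the missing percentage is taken over all cells (rows × columns) and that 1 MiB is 2**20 bytes; both are assumptions about how the report computes them, not something the report states.

```python
# Re-derive the summary ratios from the counts reported in the overview.
n_variables = 27
n_observations = 899_164
missing_cells = 751_259
total_size_mib = 185.2

# Assumption: the percentage is taken over every cell (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 2**20 bytes.
avg_record_bytes = total_size_mib * 2**20 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Under these assumptions the results round to the reported 3.1% and 216.0 B.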

Variable types

Numeric: 12
Categorical: 14
DateTime: 1

Alerts

Name has a high cardinality: 779583 distinct values [High cardinality]
City has a high cardinality: 32581 distinct values [High cardinality]
State has a high cardinality: 51 distinct values [High cardinality]
Bank has a high cardinality: 5802 distinct values [High cardinality]
BankState has a high cardinality: 56 distinct values [High cardinality]
ApprovalDate has a high cardinality: 9859 distinct values [High cardinality]
ApprovalFY has a high cardinality: 52 distinct values [High cardinality]
ChgOffDate has a high cardinality: 6448 distinct values [High cardinality]
Term is highly correlated with GrAppv and 1 other field [High correlation]
CreateJob is highly correlated with RetainedJob [High correlation]
RetainedJob is highly correlated with CreateJob [High correlation]
DisbursementGross is highly correlated with GrAppv and 1 other field [High correlation]
GrAppv is highly correlated with Term and 2 other fields [High correlation]
SBA_Appv is highly correlated with Term and 2 other fields [High correlation]
Term is highly correlated with DisbursementGross and 2 other fields [High correlation]
DisbursementGross is highly correlated with Term and 2 other fields [High correlation]
GrAppv is highly correlated with Term and 2 other fields [High correlation]
SBA_Appv is highly correlated with Term and 2 other fields [High correlation]
LoanNr_ChkDgt is highly correlated with ApprovalFY [High correlation]
State is highly correlated with Zip and 1 other field [High correlation]
Zip is highly correlated with State and 1 other field [High correlation]
BankState is highly correlated with State and 1 other field [High correlation]
NAICS is highly correlated with ApprovalFY and 1 other field [High correlation]
ApprovalFY is highly correlated with LoanNr_ChkDgt and 4 other fields [High correlation]
Term is highly correlated with MIS_Status [High correlation]
CreateJob is highly correlated with ApprovalFY and 1 other field [High correlation]
RetainedJob is highly correlated with CreateJob [High correlation]
UrbanRural is highly correlated with NAICS and 2 other fields [High correlation]
RevLineCr is highly correlated with ApprovalFY and 1 other field [High correlation]
DisbursementGross is highly correlated with GrAppv and 1 other field [High correlation]
MIS_Status is highly correlated with Term [High correlation]
GrAppv is highly correlated with DisbursementGross and 1 other field [High correlation]
SBA_Appv is highly correlated with DisbursementGross and 1 other field [High correlation]
State is highly correlated with BankState [High correlation]
UrbanRural is highly correlated with ApprovalFY [High correlation]
BankState is highly correlated with State [High correlation]
ApprovalFY is highly correlated with UrbanRural [High correlation]
ChgOffDate has 736465 (81.9%) missing values [Missing]
NoEmp is highly skewed (γ1 = 80.24824355) [Skewed]
CreateJob is highly skewed (γ1 = 36.99135473) [Skewed]
RetainedJob is highly skewed (γ1 = 36.85481184) [Skewed]
LoanNr_ChkDgt has unique values [Unique]
NAICS has 201948 (22.5%) zeros [Zeros]
CreateJob has 629248 (70.0%) zeros [Zeros]
RetainedJob has 440403 (49.0%) zeros [Zeros]
FranchiseCode has 208835 (23.2%) zeros [Zeros]
ChgOffPrinGr has 737152 (82.0%) zeros [Zeros]
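
The γ1 values in the skewness alerts are sample skewness coefficients. As a rough illustration of what such a coefficient measures, here is a sketch of the Fisher-Pearson form (third central moment divided by the variance raised to 1.5). The helper name `skewness_g1` and the sample data are hypothetical, and the report's own figures may use a bias-corrected variant, so this is not expected to reproduce them digit for digit.

```python
def skewness_g1(values):
    """Fisher-Pearson skewness: third central moment / variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical employee counts: mostly small firms plus one extreme value.
sample = [1, 2, 2, 3, 4, 5, 10, 9999]
symmetric = [1, 2, 3, 4, 5]

print(skewness_g1(sample))     # large and positive: long right tail
print(skewness_g1(symmetric))  # zero for a symmetric sample
```

A single extreme value is enough to push the coefficient well above zero, which is the pattern the alerts for NoEmp, CreateJob and RetainedJob point to.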

Reproduction

Analysis started: 2022-06-22 00:59:41.766961
Analysis finished: 2022-06-22 01:02:22.207278
Duration: 2 minutes and 40.44 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json
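
A report of this kind is typically generated with a few lines of code. The sketch below is an assumption about how a similar run could be set up with pandas-profiling: the function name `build_report`, the CSV path argument and the output file name are hypothetical, and the settings stored in config.json are not reproduced here.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write the result as an HTML report."""
    # Imports are local so the function can be defined without the
    # libraries installed; both must be available when it is called.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # low_memory=False lets pandas infer each column's type in one pass.
    df = pd.read_csv(csv_path, low_memory=False)
    report = ProfileReport(df, title="Dataset profile")
    report.to_file(html_path)
    return report
```

With roughly 900,000 rows the full analysis took under three minutes in the run documented above; timings on other hardware will differ.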

Variables

LoanNr_ChkDgt
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 899164
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4772612311
Minimum: 1000014003
Maximum: 9996003010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 1000014003
5-th percentile: 1348457210
Q1: 2589757508
median: 4361439006
Q3: 6904626505
95-th percentile: 9164803856
Maximum: 9996003010
Range: 8995989007
Interquartile range (IQR): 4314868996

Descriptive statistics

Standard deviation: 2538175037
Coefficient of variation (CV): 0.5318209132
Kurtosis: -1.086498977
Mean: 4772612311
Median Absolute Deviation (MAD): 2013400000
Skewness: 0.364757102
Sum: 4.291361176 × 10^15
Variance: 6.442332521 × 10^18
Monotonicity: Strictly increasing
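
Several of these statistics follow from others in the same block, which gives a quick consistency check. The sketch below assumes the usual definitions (range = maximum − minimum, CV = standard deviation / mean, variance = standard deviation squared) and uses the rounded values as printed in the report, so agreement is only expected up to rounding.

```python
import math

# Values as printed in the report for LoanNr_ChkDgt (already rounded).
minimum, maximum = 1_000_014_003, 9_996_003_010
mean = 4_772_612_311
std = 2_538_175_037

value_range = maximum - minimum   # range = max - min
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance = std squared

print(value_range)
print(round(cv, 10))
print(f"{variance:.9e}")
print(math.isclose(variance, 6.442332521e18, rel_tol=1e-7))
```

Since the column is a loan identifier, these figures describe how the IDs are spread rather than anything about the loans themselves.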
Histogram with fixed size bins (bins=50)
Common values
Value                    Count    Frequency (%)
1000014003               1        < 0.1%
5944984007               1        < 0.1%
5944874009               1        < 0.1%
5944884001               1        < 0.1%
5944904005               1        < 0.1%
5944914008               1        < 0.1%
5944924000               1        < 0.1%
5944934003               1        < 0.1%
5944944006               1        < 0.1%
5944954009               1        < 0.1%
Other values (899154)    899154   > 99.9%

Minimum 10 values
Value         Count   Frequency (%)
1000014003    1       < 0.1%
1000024006    1       < 0.1%
1000034009    1       < 0.1%
1000044001    1       < 0.1%
1000054004    1       < 0.1%
1000084002    1       < 0.1%
1000093009    1       < 0.1%
1000094005    1       < 0.1%
1000104006    1       < 0.1%
1000124001    1       < 0.1%

Maximum 10 values
Value         Count   Frequency (%)
9996003010    1       < 0.1%
9995973006    1       < 0.1%
9995613003    1       < 0.1%
9995603000    1       < 0.1%
9995573004    1       < 0.1%
9995563001    1       < 0.1%
9995493004    1       < 0.1%
9995473009    1       < 0.1%
9995453003    1       < 0.1%
9995423005    1       < 0.1%

Name
Categorical

HIGH CARDINALITY

Distinct: 779583
Distinct (%): 86.7%
Missing: 14
Missing (%): < 0.1%
Memory size: 6.9 MiB

SUBWAY: 1269
QUIZNO'S SUBS: 433
COLD STONE CREAMERY: 366
QUIZNO'S: 345
DOMINO'S PIZZA: 329
Other values (779578): 896408

Length

Max length: 30
Median length: 23
Mean length: 21.77596285
Min length: 1

Characters and Unicode

Total characters: 19579857
Distinct characters: 91
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
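
As a small illustration of that idea, Python's standard `unicodedata` module exposes the general category of each character. The sketch below counts categories for one of this variable's sample values; the helper name `category_counts` is hypothetical. The two-letter codes correspond to the category names used in this report (`Lu` = Uppercase Letter, `Zs` = Space Separator, `Po` = Other Punctuation).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lu', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

# One of the sample values reported for the Name variable.
counts = category_counts("BIG BUCKS PAWN & JEWELRY, LLC")
print(counts)
```

Summing such counters over every value of a column would give totals of the kind shown in the category tables for this variable.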

Unique

Unique: 706468
Unique (%): 78.6%

Sample

1st row: ABC HOBBYCRAFT
2nd row: LANDMARK BAR & GRILLE (THE)
3rd row: WHITLOCK DDS, TODD M.
4th row: BIG BUCKS PAWN & JEWELRY, LLC
5th row: ANASTASIA CONFECTIONS, INC.

Common Values

Value                    Count    Frequency (%)
SUBWAY                   1269     0.1%
QUIZNO'S SUBS            433      < 0.1%
COLD STONE CREAMERY      366      < 0.1%
QUIZNO'S                 345      < 0.1%
DOMINO'S PIZZA           329      < 0.1%
DAIRY QUEEN              328      < 0.1%
THE UPS STORE            323      < 0.1%
DUNKIN DONUTS            299      < 0.1%
MATCO TOOLS              288      < 0.1%
MAIL BOXES ETC           280      < 0.1%
Other values (779573)    894890   99.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
inc263379
 
8.4%
100280
 
3.2%
llc77826
 
2.5%
and28959
 
0.9%
the28389
 
0.9%
of23026
 
0.7%
dba20214
 
0.6%
co18216
 
0.6%
a18114
 
0.6%
services17318
 
0.6%
Other values (226643)2530176
80.9%

Most occurring characters

Value                Count     Frequency (%)
(space)              2231639   11.4%
E                    1354056   6.9%
I                    1226719   6.3%
A                    1177821   6.0%
N                    1170319   6.0%
R                    1052562   5.4%
C                    1038114   5.3%
S                    1009495   5.2%
O                    933206    4.8%
T                    917437    4.7%
Other values (81)    7468489   38.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter14311292
73.1%
Lowercase Letter2249775
 
11.5%
Space Separator2231639
 
11.4%
Other Punctuation712203
 
3.6%
Decimal Number38461
 
0.2%
Dash Punctuation29147
 
0.1%
Open Punctuation3600
 
< 0.1%
Close Punctuation2973
 
< 0.1%
Math Symbol498
 
< 0.1%
Currency Symbol198
 
< 0.1%
Other values (2)71
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
E1354056
 
9.5%
I1226719
 
8.6%
A1177821
 
8.2%
N1170319
 
8.2%
R1052562
 
7.4%
C1038114
 
7.3%
S1009495
 
7.1%
O933206
 
6.5%
T917437
 
6.4%
L840208
 
5.9%
Other values (16)3591355
25.1%
Lowercase Letter
ValueCountFrequency (%)
e250402
11.1%
n238175
10.6%
a206694
9.2%
r187739
 
8.3%
i180961
 
8.0%
o178702
 
7.9%
t151259
 
6.7%
s141102
 
6.3%
c123850
 
5.5%
l107780
 
4.8%
Other values (16)483111
21.5%
Other Punctuation
ValueCountFrequency (%)
.273453
38.4%
,244641
34.3%
&104166
 
14.6%
'73757
 
10.4%
/10119
 
1.4%
#3514
 
0.5%
"906
 
0.1%
!473
 
0.1%
:411
 
0.1%
*244
 
< 0.1%
Other values (5)519
 
0.1%
Decimal Number
ValueCountFrequency (%)
17572
19.7%
26295
16.4%
04730
12.3%
33993
10.4%
43678
9.6%
52715
 
7.1%
82585
 
6.7%
62467
 
6.4%
72234
 
5.8%
92192
 
5.7%
Math Symbol
ValueCountFrequency (%)
+468
94.0%
=16
 
3.2%
>9
 
1.8%
<5
 
1.0%
Open Punctuation
ValueCountFrequency (%)
(3597
99.9%
[3
 
0.1%
Close Punctuation
ValueCountFrequency (%)
)2972
> 99.9%
]1
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
`64
94.1%
^4
 
5.9%
Space Separator
ValueCountFrequency (%)
2231639
100.0%
Dash Punctuation
ValueCountFrequency (%)
-29147
100.0%
Currency Symbol
ValueCountFrequency (%)
$198
100.0%
Connector Punctuation
ValueCountFrequency (%)
_3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin16561067
84.6%
Common3018790
 
15.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
E1354056
 
8.2%
I1226719
 
7.4%
A1177821
 
7.1%
N1170319
 
7.1%
R1052562
 
6.4%
C1038114
 
6.3%
S1009495
 
6.1%
O933206
 
5.6%
T917437
 
5.5%
L840208
 
5.1%
Other values (42)5841130
35.3%
Common
ValueCountFrequency (%)
2231639
73.9%
.273453
 
9.1%
,244641
 
8.1%
&104166
 
3.5%
'73757
 
2.4%
-29147
 
1.0%
/10119
 
0.3%
17572
 
0.3%
26295
 
0.2%
04730
 
0.2%
Other values (29)33271
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII19579857
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2231639
 
11.4%
E1354056
 
6.9%
I1226719
 
6.3%
A1177821
 
6.0%
N1170319
 
6.0%
R1052562
 
5.4%
C1038114
 
5.3%
S1009495
 
5.2%
O933206
 
4.8%
T917437
 
4.7%
Other values (81)7468489
38.1%

City
Categorical

HIGH CARDINALITY

Distinct: 32581
Distinct (%): 3.6%
Missing: 30
Missing (%): < 0.1%
Memory size: 6.9 MiB

LOS ANGELES: 11558
HOUSTON: 10247
NEW YORK: 7846
CHICAGO: 6036
MIAMI: 5594
Other values (32576): 857853

Length

Max length: 30
Median length: 27
Mean length: 9.103062502
Min length: 1

Characters and Unicode

Total characters: 8184873
Distinct characters: 80
Distinct categories: 9
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 12872
Unique (%): 1.4%

Sample

1st row: EVANSVILLE
2nd row: NEW PARIS
3rd row: BLOOMINGTON
4th row: BROKEN ARROW
5th row: ORLANDO

Common Values

Value                   Count    Frequency (%)
LOS ANGELES             11558    1.3%
HOUSTON                 10247    1.1%
NEW YORK                7846     0.9%
CHICAGO                 6036     0.7%
MIAMI                   5594     0.6%
SAN DIEGO               5363     0.6%
DALLAS                  5085     0.6%
PHOENIX                 4493     0.5%
LAS VEGAS               4390     0.5%
SPRINGFIELD             3738     0.4%
Other values (32571)    834784   92.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
city23831
 
2.0%
san21942
 
1.8%
new16075
 
1.3%
los13000
 
1.1%
angeles12380
 
1.0%
lake10729
 
0.9%
houston10587
 
0.9%
beach10462
 
0.9%
park10316
 
0.9%
york9724
 
0.8%
Other values (17695)1066583
88.5%

Most occurring characters

ValueCountFrequency (%)
A744405
 
9.1%
E723098
 
8.8%
O632510
 
7.7%
N621338
 
7.6%
L573578
 
7.0%
R513614
 
6.3%
S475392
 
5.8%
I468344
 
5.7%
T425108
 
5.2%
306936
 
3.8%
Other values (70)2700550
33.0%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter7442897
90.9%
Lowercase Letter398062
 
4.9%
Space Separator306936
 
3.8%
Open Punctuation14884
 
0.2%
Other Punctuation11120
 
0.1%
Close Punctuation9119
 
0.1%
Dash Punctuation946
 
< 0.1%
Decimal Number870
 
< 0.1%
Modifier Symbol39
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A744405
 
10.0%
E723098
 
9.7%
O632510
 
8.5%
N621338
 
8.3%
L573578
 
7.7%
R513614
 
6.9%
S475392
 
6.4%
I468344
 
6.3%
T425108
 
5.7%
C262549
 
3.5%
Other values (16)2002961
26.9%
Lowercase Letter
ValueCountFrequency (%)
e43411
10.9%
a41550
10.4%
n36545
9.2%
o36384
9.1%
l32699
 
8.2%
i30470
 
7.7%
r29637
 
7.4%
t24529
 
6.2%
s21884
 
5.5%
d12360
 
3.1%
Other values (16)88593
22.3%
Other Punctuation
ValueCountFrequency (%)
.8672
78.0%
,1215
 
10.9%
'1134
 
10.2%
:29
 
0.3%
&22
 
0.2%
/21
 
0.2%
;18
 
0.2%
#5
 
< 0.1%
@2
 
< 0.1%
*1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0153
17.6%
1145
16.7%
2113
13.0%
590
10.3%
486
9.9%
378
9.0%
663
7.2%
951
 
5.9%
849
 
5.6%
742
 
4.8%
Open Punctuation
ValueCountFrequency (%)
(14879
> 99.9%
[5
 
< 0.1%
Modifier Symbol
ValueCountFrequency (%)
`38
97.4%
^1
 
2.6%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Close Punctuation
ValueCountFrequency (%)
)9119
100.0%
Dash Punctuation
ValueCountFrequency (%)
-946
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin7840959
95.8%
Common343914
 
4.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
A744405
 
9.5%
E723098
 
9.2%
O632510
 
8.1%
N621338
 
7.9%
L573578
 
7.3%
R513614
 
6.6%
S475392
 
6.1%
I468344
 
6.0%
T425108
 
5.4%
C262549
 
3.3%
Other values (42)2401023
30.6%
Common
ValueCountFrequency (%)
306936
89.2%
(14879
 
4.3%
)9119
 
2.7%
.8672
 
2.5%
,1215
 
0.4%
'1134
 
0.3%
-946
 
0.3%
0153
 
< 0.1%
1145
 
< 0.1%
2113
 
< 0.1%
Other values (18)602
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII8184873
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A744405
 
9.1%
E723098
 
8.8%
O632510
 
7.7%
N621338
 
7.6%
L573578
 
7.0%
R513614
 
6.3%
S475392
 
5.8%
I468344
 
5.7%
T425108
 
5.2%
306936
 
3.8%
Other values (70)2700550
33.0%

State
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 51
Distinct (%): < 0.1%
Missing: 14
Missing (%): < 0.1%
Memory size: 6.9 MiB

CA: 130619
TX: 70458
NY: 57693
FL: 41212
PA: 35170
Other values (46): 563998

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1798300
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: IN
2nd row: IN
3rd row: IN
4th row: OK
5th row: FL

Common Values

ValueCountFrequency (%)
CA                   130619   14.5%
TX                   70458    7.8%
NY                   57693    6.4%
FL                   41212    4.6%
PA                   35170    3.9%
OH                   32622    3.6%
IL                   29669    3.3%
MA                   25272    2.8%
MN                   24373    2.7%
NJ                   24035    2.7%
Other values (41)    428027   47.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca130619
 
14.5%
tx70458
 
7.8%
ny57693
 
6.4%
fl41212
 
4.6%
pa35170
 
3.9%
oh32622
 
3.6%
il29669
 
3.3%
ma25272
 
2.8%
mn24373
 
2.7%
nj24035
 
2.7%
Other values (41)428027
47.6%

Most occurring characters

ValueCountFrequency (%)
A306176
17.0%
C184957
10.3%
N181727
10.1%
M132549
 
7.4%
T125069
 
7.0%
I119518
 
6.6%
O94906
 
5.3%
L88819
 
4.9%
X70458
 
3.9%
Y68255
 
3.8%
Other values (14)425866
23.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1798300
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A306176
17.0%
C184957
10.3%
N181727
10.1%
M132549
 
7.4%
T125069
 
7.0%
I119518
 
6.6%
O94906
 
5.3%
L88819
 
4.9%
X70458
 
3.9%
Y68255
 
3.8%
Other values (14)425866
23.7%

Most occurring scripts

ValueCountFrequency (%)
Latin1798300
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A306176
17.0%
C184957
10.3%
N181727
10.1%
M132549
 
7.4%
T125069
 
7.0%
I119518
 
6.6%
O94906
 
5.3%
L88819
 
4.9%
X70458
 
3.9%
Y68255
 
3.8%
Other values (14)425866
23.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII1798300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A306176
17.0%
C184957
10.3%
N181727
10.1%
M132549
 
7.4%
T125069
 
7.0%
I119518
 
6.6%
O94906
 
5.3%
L88819
 
4.9%
X70458
 
3.9%
Y68255
 
3.8%
Other values (14)425866
23.7%

Zip
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 33611
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 53804.39124
Minimum: 0
Maximum: 99999
Zeros: 283
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 3838
Q1: 27587
median: 55410
Q3: 83704
95-th percentile: 95822
Maximum: 99999
Range: 99999
Interquartile range (IQR): 56117

Descriptive statistics

Standard deviation: 31184.15915
Coefficient of variation (CV): 0.5795839044
Kurtosis: -1.335989332
Mean: 53804.39124
Median Absolute Deviation (MAD): 28206
Skewness: -0.1681666308
Sum: 4.837897165 × 10^10
Variance: 972451782
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                   Count    Frequency (%)
10001                   933      0.1%
90015                   926      0.1%
93401                   806      0.1%
90010                   733      0.1%
33166                   671      0.1%
90021                   666      0.1%
59601                   640      0.1%
65804                   599      0.1%
3801                    581      0.1%
59101                   578      0.1%
Other values (33601)    892031   99.2%

Minimum 10 values
Value   Count   Frequency (%)
0       283     < 0.1%
1       24      < 0.1%
2       11      < 0.1%
3       5       < 0.1%
4       5       < 0.1%
5       5       < 0.1%
6       4       < 0.1%
7       6       < 0.1%
8       15      < 0.1%
9       24      < 0.1%

Maximum 10 values
Value   Count   Frequency (%)
99999   209     < 0.1%
99950   3       < 0.1%
99929   15      < 0.1%
99928   1       < 0.1%
99926   1       < 0.1%
99925   4       < 0.1%
99923   1       < 0.1%
99921   13      < 0.1%
99919   2       < 0.1%
99918   1       < 0.1%
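
Because Zip is stored as a number, any leading zeros are lost (a value such as 3801 may originally have been 03801), and the extremes 0 and 99999 look more like placeholders than real codes. The sketch below shows one way to restore a five-character string; the helper name `format_zip` is hypothetical, and both the five-digit US ZIP format and the treatment of 0 and 99999 as placeholders are assumptions, not facts stated by the report.

```python
def format_zip(value):
    """Render a numeric ZIP as a 5-character string, or None if implausible."""
    # Assumption: 0 and 99999 are placeholders rather than real ZIP codes.
    if value in (0, 99999):
        return None
    return f"{int(value):05d}"

print(format_zip(3801))   # 03801
print(format_zip(10001))  # 10001
print(format_zip(0))      # None
```

Treating the column as text after such a conversion also keeps it out of numeric summaries like mean and variance, which carry little meaning for postal codes.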

Bank
Categorical

HIGH CARDINALITY

Distinct: 5802
Distinct (%): 0.6%
Missing: 1559
Missing (%): 0.2%
Memory size: 6.9 MiB

BANK OF AMERICA NATL ASSOC: 86853
WELLS FARGO BANK NATL ASSOC: 63503
JPMORGAN CHASE BANK NATL ASSOC: 48167
U.S. BANK NATIONAL ASSOCIATION: 35143
CITIZENS BANK NATL ASSOC: 35054
Other values (5797): 628885

Length

Max length: 30
Median length: 26
Mean length: 23.1879457
Min length: 3

Characters and Unicode

Total characters: 20813616
Distinct characters: 50
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 923
Unique (%): 0.1%

Sample

1st row: FIFTH THIRD BANK
2nd row: 1ST SOURCE BANK
3rd row: GRANT COUNTY STATE BANK
4th row: 1ST NATL BK & TR CO OF BROKEN
5th row: FLORIDA BUS. DEVEL CORP

Common Values

ValueCountFrequency (%)
BANK OF AMERICA NATL ASSOC        86853    9.7%
WELLS FARGO BANK NATL ASSOC       63503    7.1%
JPMORGAN CHASE BANK NATL ASSOC    48167    5.4%
U.S. BANK NATIONAL ASSOCIATION    35143    3.9%
CITIZENS BANK NATL ASSOC          35054    3.9%
PNC BANK, NATIONAL ASSOCIATION    27351    3.0%
BBCN BANK                         22978    2.6%
CAPITAL ONE NATL ASSOC            22248    2.5%
MANUFACTURERS & TRADERS TR CO     11265    1.3%
READYCAP LENDING, LLC             10664    1.2%
Other values (5792)               534379   59.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bank651608
18.5%
natl318240
 
9.0%
assoc306768
 
8.7%
of142852
 
4.1%
national125899
 
3.6%
america100686
 
2.9%
association84965
 
2.4%
fargo63732
 
1.8%
wells63650
 
1.8%
52264
 
1.5%
Other values (3602)1606709
45.7%

Most occurring characters

ValueCountFrequency (%)
A2762231
13.3%
2620014
12.6%
N2105500
10.1%
S1520499
 
7.3%
O1336993
 
6.4%
T1181841
 
5.7%
C1134642
 
5.5%
I1061717
 
5.1%
E923739
 
4.4%
L922583
 
4.4%
Other values (40)5243857
25.2%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter17830764
85.7%
Space Separator2620014
 
12.6%
Other Punctuation341354
 
1.6%
Dash Punctuation10861
 
0.1%
Decimal Number9482
 
< 0.1%
Open Punctuation584
 
< 0.1%
Close Punctuation555
 
< 0.1%
Math Symbol2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A2762231
15.5%
N2105500
11.8%
S1520499
 
8.5%
O1336993
 
7.5%
T1181841
 
6.6%
C1134642
 
6.4%
I1061717
 
6.0%
E923739
 
5.2%
L922583
 
5.2%
B893994
 
5.0%
Other values (16)3987025
22.4%
Decimal Number
ValueCountFrequency (%)
15538
58.4%
51268
 
13.4%
01258
 
13.3%
41222
 
12.9%
2112
 
1.2%
733
 
0.3%
324
 
0.3%
917
 
0.2%
87
 
0.1%
63
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
.192998
56.5%
,94677
27.7%
&50021
 
14.7%
/1833
 
0.5%
'1811
 
0.5%
:10
 
< 0.1%
#2
 
< 0.1%
*1
 
< 0.1%
%1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2620014
100.0%
Dash Punctuation
ValueCountFrequency (%)
-10861
100.0%
Open Punctuation
ValueCountFrequency (%)
(584
100.0%
Close Punctuation
ValueCountFrequency (%)
)555
100.0%
Math Symbol
ValueCountFrequency (%)
+2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin17830764
85.7%
Common2982852
 
14.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A2762231
15.5%
N2105500
11.8%
S1520499
 
8.5%
O1336993
 
7.5%
T1181841
 
6.6%
C1134642
 
6.4%
I1061717
 
6.0%
E923739
 
5.2%
L922583
 
5.2%
B893994
 
5.0%
Other values (16)3987025
22.4%
Common
ValueCountFrequency (%)
2620014
87.8%
.192998
 
6.5%
,94677
 
3.2%
&50021
 
1.7%
-10861
 
0.4%
15538
 
0.2%
/1833
 
0.1%
'1811
 
0.1%
51268
 
< 0.1%
01258
 
< 0.1%
Other values (14)2573
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII20813616
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A2762231
13.3%
2620014
12.6%
N2105500
10.1%
S1520499
 
7.3%
O1336993
 
6.4%
T1181841
 
5.7%
C1134642
 
5.5%
I1061717
 
5.1%
E923739
 
4.4%
L922583
 
4.4%
Other values (40)5243857
25.2%

BankState
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 56
Distinct (%): < 0.1%
Missing: 1566
Missing (%): 0.2%
Memory size: 6.9 MiB

CA: 118116
NC: 79514
IL: 65908
OH: 58461
SD: 51095
Other values (51): 524504

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 1795196
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: OH
2nd row: IN
3rd row: IN
4th row: OK
5th row: FL

Common Values

ValueCountFrequency (%)
CA                   118116   13.1%
NC                   79514    8.8%
IL                   65908    7.3%
OH                   58461    6.5%
SD                   51095    5.7%
TX                   47790    5.3%
RI                   45366    5.0%
NY                   39592    4.4%
VA                   29002    3.2%
DE                   24537    2.7%
Other values (46)    338217   37.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ca118116
 
13.2%
nc79514
 
8.9%
il65908
 
7.3%
oh58461
 
6.5%
sd51095
 
5.7%
tx47790
 
5.3%
ri45366
 
5.1%
ny39592
 
4.4%
va29002
 
3.2%
de24537
 
2.7%
Other values (46)338217
37.7%

Most occurring characters

ValueCountFrequency (%)
A241398
13.4%
C229604
12.8%
N187751
10.5%
I158854
 
8.8%
O102604
 
5.7%
L96914
 
5.4%
D96078
 
5.4%
T94941
 
5.3%
M85034
 
4.7%
S73385
 
4.1%
Other values (14)428633
23.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter1795196
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A241398
13.4%
C229604
12.8%
N187751
10.5%
I158854
 
8.8%
O102604
 
5.7%
L96914
 
5.4%
D96078
 
5.4%
T94941
 
5.3%
M85034
 
4.7%
S73385
 
4.1%
Other values (14)428633
23.9%

Most occurring scripts

ValueCountFrequency (%)
Latin1795196
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A241398
13.4%
C229604
12.8%
N187751
10.5%
I158854
 
8.8%
O102604
 
5.7%
L96914
 
5.4%
D96078
 
5.4%
T94941
 
5.3%
M85034
 
4.7%
S73385
 
4.1%
Other values (14)428633
23.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII1795196
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A241398
13.4%
C229604
12.8%
N187751
10.5%
I158854
 
8.8%
O102604
 
5.7%
L96914
 
5.4%
D96078
 
5.4%
T94941
 
5.3%
M85034
 
4.7%
S73385
 
4.1%
Other values (14)428633
23.9%

NAICS
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 1312
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 398660.9501
Minimum: 0
Maximum: 928120
Zeros: 201948
Zeros (%): 22.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 235210
median: 445310
Q3: 561730
95-th percentile: 811192
Maximum: 928120
Range: 928120
Interquartile range (IQR): 326520

Descriptive statistics

Standard deviation: 263318.3128
Coefficient of variation (CV): 0.6605069111
Kurtosis: -1.047652612
Mean: 398660.9501
Median Absolute Deviation (MAD): 176300
Skewness: -0.2628783414
Sum: 3.584615746 × 10^11
Variance: 6.933653383 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value                  Count    Frequency (%)
0                      201948   22.5%
722110                 27989    3.1%
722211                 19448    2.2%
811111                 14585    1.6%
621210                 14048    1.6%
624410                 10111    1.1%
812112                 9230     1.0%
561730                 8935     1.0%
621310                 8733     1.0%
812320                 7894     0.9%
Other values (1302)    576243   64.1%

Minimum 10 values
Value    Count    Frequency (%)
0        201948   22.5%
111110   32       < 0.1%
111120   3        < 0.1%
111130   1        < 0.1%
111140   94       < 0.1%
111150   49       < 0.1%
111160   2        < 0.1%
111191   3        < 0.1%
111199   7        < 0.1%
111211   16       < 0.1%

Maximum 10 values
Value    Count   Frequency (%)
928120   32      < 0.1%
928110   4       < 0.1%
927110   1       < 0.1%
926150   10      < 0.1%
926140   6       < 0.1%
926130   3       < 0.1%
926120   5       < 0.1%
926110   6       < 0.1%
925120   1       < 0.1%
925110   3       < 0.1%
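
NAICS codes are hierarchical identifiers, not quantities: the first two digits of a six-digit code name the industry sector. The sketch below extracts that prefix. The helper name `naics_sector` is hypothetical, and treating 0 as "no code recorded" is an assumption suggested by the 22.5% of zeros, not something the report confirms.

```python
def naics_sector(code):
    """Return the two-digit sector prefix of a six-digit NAICS code."""
    # Assumption: 0 means "no code recorded", so it maps to None.
    if code == 0:
        return None
    return code // 10_000

print(naics_sector(722110))  # 72
print(naics_sector(111110))  # 11
print(naics_sector(0))       # None
```

Grouping by sector would reduce the 1312 distinct codes to a small categorical variable, which is usually easier to summarise than the numeric statistics shown here.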

ApprovalDate
Categorical

HIGH CARDINALITY

Distinct: 9859
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

7-Jul-93: 1131
30-Jan-04: 1032
8-Jul-93: 780
4-Oct-04: 658
30-Sep-03: 608
Other values (9854): 894955

Length

Max length: 9
Median length: 9
Mean length: 8.721139859
Min length: 8

Characters and Unicode

Total characters: 7841735
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 952
Unique (%): 0.1%

Sample

1st row: 28-Feb-97
2nd row: 28-Feb-97
3rd row: 28-Feb-97
4th row: 28-Feb-97
5th row: 28-Feb-97
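
ApprovalDate is profiled as categorical because its values are day-month-year strings. A minimal sketch of parsing them with the standard library follows; the helper name `parse_approval_date` is hypothetical, and the format string is inferred from the sample rows. Note that `%b` depends on the locale (English month abbreviations are assumed) and that two-digit years are resolved with a fixed pivot, which may be wrong for loans approved before 1969.

```python
from datetime import datetime

def parse_approval_date(text):
    """Parse strings such as '28-Feb-97' into datetime objects."""
    # %y uses the POSIX pivot: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    return datetime.strptime(text, "%d-%b-%y")

print(parse_approval_date("28-Feb-97"))  # 1997-02-28 00:00:00
print(parse_approval_date("7-Jul-93"))   # 1993-07-07 00:00:00
```

Once parsed, the column would be treated as a date and summarised by range and distribution over time instead of by character counts.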

Common Values

ValueCountFrequency (%)
7-Jul-93               1131     0.1%
30-Jan-04              1032     0.1%
8-Jul-93               780      0.1%
4-Oct-04               658      0.1%
30-Sep-03              608      0.1%
30-Jun-05              572      0.1%
18-Apr-05              534      0.1%
6-Jul-93               523      0.1%
21-Jan-05              498      0.1%
27-Sep-02              497      0.1%
Other values (9849)    892331   99.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
7-jul-931131
 
0.1%
30-jan-041032
 
0.1%
8-jul-93780
 
0.1%
4-oct-04658
 
0.1%
30-sep-03608
 
0.1%
30-jun-05572
 
0.1%
18-apr-05534
 
0.1%
6-jul-93523
 
0.1%
21-jan-05498
 
0.1%
27-sep-02497
 
0.1%
Other values (9849)892331
99.2%

Most occurring characters

ValueCountFrequency (%)
-1798328
22.9%
0687310
 
8.8%
1492781
 
6.3%
9470677
 
6.0%
2464364
 
5.9%
u233553
 
3.0%
3229057
 
2.9%
a227906
 
2.9%
J221861
 
2.8%
e219341
 
2.8%
Other values (23)2796557
35.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3345915
42.7%
Dash Punctuation1798328
22.9%
Lowercase Letter1798328
22.9%
Uppercase Letter899164
 
11.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u233553
13.0%
a227906
12.7%
e219341
12.2%
r163835
9.1%
p163275
9.1%
n145374
8.1%
c139688
7.8%
g78776
 
4.4%
y77194
 
4.3%
l76487
 
4.3%
Other values (4)272899
15.2%
Decimal Number
ValueCountFrequency (%)
0687310
20.5%
1492781
14.7%
9470677
14.1%
2464364
13.9%
3229057
 
6.8%
6208904
 
6.2%
5203699
 
6.1%
7199006
 
5.9%
4197260
 
5.9%
8192857
 
5.8%
Uppercase Letter
ValueCountFrequency (%)
J221861
24.7%
M160822
17.9%
A158983
17.7%
S83068
 
9.2%
D69931
 
7.8%
O69757
 
7.8%
N68400
 
7.6%
F66342
 
7.4%
Dash Punctuation
ValueCountFrequency (%)
-1798328
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common5144243
65.6%
Latin2697492
34.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
u233553
 
8.7%
a227906
 
8.4%
J221861
 
8.2%
e219341
 
8.1%
r163835
 
6.1%
p163275
 
6.1%
M160822
 
6.0%
A158983
 
5.9%
n145374
 
5.4%
c139688
 
5.2%
Other values (12)862854
32.0%
Common
ValueCountFrequency (%)
-1798328
35.0%
0687310
 
13.4%
1492781
 
9.6%
9470677
 
9.1%
2464364
 
9.0%
3229057
 
4.5%
6208904
 
4.1%
5203699
 
4.0%
7199006
 
3.9%
4197260
 
3.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII7841735
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-1798328
22.9%
0687310
 
8.8%
1492781
 
6.3%
9470677
 
6.0%
2464364
 
5.9%
u233553
 
3.0%
3229057
 
2.9%
a227906
 
2.9%
J221861
 
2.8%
e219341
 
2.8%
Other values (23)2796557
35.7%

ApprovalFY
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

2005: 77525
2006: 76040
2007: 71876
2004: 68290
2003: 58193
Other values (47): 547240

Length

Max length: 5
Median length: 4
Mean length: 4.000020019
Min length: 4

Characters and Unicode

Total characters: 3596674
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3
Unique (%): < 0.1%

Sample

1st row: 1997
2nd row: 1997
3rd row: 1997
4th row: 1997
5th row: 1997

Common Values

ValueCountFrequency (%)
2005                 77525    8.6%
2006                 76040    8.5%
2007                 71876    8.0%
2004                 68290    7.6%
2003                 58193    6.5%
1995                 45758    5.1%
2002                 44391    4.9%
1996                 40112    4.5%
2008                 39540    4.4%
1997                 37748    4.2%
Other values (42)    339691   37.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
200577525
 
8.6%
200676040
 
8.5%
200771876
 
8.0%
200468290
 
7.6%
200358193
 
6.5%
199545758
 
5.1%
200244391
 
4.9%
199640112
 
4.5%
200839540
 
4.4%
199737748
 
4.2%
Other values (42)339691
37.8%

Most occurring characters

ValueCountFrequency (%)
01167176
32.5%
9704676
19.6%
2639911
17.8%
1435726
 
12.1%
5125258
 
3.5%
6118366
 
3.3%
7112975
 
3.1%
8104656
 
2.9%
4102220
 
2.8%
385692
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number3596656
> 99.9%
Uppercase Letter18
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
01167176
32.5%
9704676
19.6%
2639911
17.8%
1435726
 
12.1%
5125258
 
3.5%
6118366
 
3.3%
7112975
 
3.1%
8104656
 
2.9%
4102220
 
2.8%
385692
 
2.4%
Uppercase Letter
Value   Count   Frequency (%)
A       18      100.0%
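
The 18 occurrences of the letter "A", together with a maximum length of 5, suggest that a handful of fiscal-year values carry a trailing letter, which would explain why the column is profiled as categorical and not numeric. The sketch below extracts the leading four digits; the helper name `parse_fiscal_year` is hypothetical and "1976A" is a made-up example of such a value, not one taken from the report.

```python
import re

def parse_fiscal_year(text):
    """Extract the four-digit year from values such as '2005' or '1976A'."""
    match = re.match(r"\d{4}", str(text).strip())
    return int(match.group()) if match else None

print(parse_fiscal_year("2005"))   # 2005
print(parse_fiscal_year("1976A"))  # 1976 ('1976A' is a made-up example)
print(parse_fiscal_year("n/a"))    # None
```

Whether the suffix carries meaning that should be kept in a separate column is a question for the dataset's documentation.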

Most occurring scripts

ValueCountFrequency (%)
Common3596656
> 99.9%
Latin18
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
01167176
32.5%
9704676
19.6%
2639911
17.8%
1435726
 
12.1%
5125258
 
3.5%
6118366
 
3.3%
7112975
 
3.1%
8104656
 
2.9%
4102220
 
2.8%
385692
 
2.4%
Latin
ValueCountFrequency (%)
A18
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3596674
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
01167176
32.5%
9704676
19.6%
2639911
17.8%
1435726
 
12.1%
5125258
 
3.5%
6118366
 
3.3%
7112975
 
3.1%
8104656
 
2.9%
4102220
 
2.8%
385692
 
2.4%

Term
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 412
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 110.7730781
Minimum: 0
Maximum: 569
Zeros: 810
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 16
Q1: 60
median: 84
Q3: 120
95-th percentile: 300
Maximum: 569
Range: 569
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 78.85730507
Coefficient of variation (CV): 0.7118815006
Kurtosis: 0.1857042421
Mean: 110.7730781
Median Absolute Deviation (MAD): 33
Skewness: 1.120925802
Sum: 99603164
Variance: 6218.474562
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
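
A fixed-size-bin histogram splits the value range into equal-width intervals and counts the values in each. The sketch below illustrates the idea for 50 bins over the reported Term range of 0 to 569. The helper name `fixed_width_counts` and the sample terms are hypothetical, and placing the maximum in the last bin is a common convention assumed here, not a detail confirmed by the report.

```python
def fixed_width_counts(values, bins, lo, hi):
    """Count values into `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum falls into the last bin rather than a new one.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    return counts

# Hypothetical loan terms; 0 and 569 are the reported minimum and maximum.
terms = [0, 12, 60, 84, 84, 84, 120, 240, 300, 569]
counts = fixed_width_counts(terms, bins=50, lo=0, hi=569)
print(counts[:12])
```

With a width of about 11.4 per bin, the very frequent terms 60, 84 and 120 each land in their own bin, so the histogram shows them as separate spikes.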
Common values
Value                 Count    Frequency (%)
84                    230162   25.6%
60                    89945    10.0%
240                   85982    9.6%
120                   77654    8.6%
300                   44727    5.0%
180                   28164    3.1%
36                    19800    2.2%
12                    17095    1.9%
48                    15621    1.7%
72                    9419     1.0%
Other values (402)    280595   31.2%

Minimum 10 values
Value   Count   Frequency (%)
0       810     0.1%
1       1608    0.2%
2       1809    0.2%
3       2112    0.2%
4       2173    0.2%
5       1866    0.2%
6       3054    0.3%
7       1761    0.2%
8       1693    0.2%
9       1875    0.2%

Maximum 10 values
Value   Count   Frequency (%)
569     1       < 0.1%
527     1       < 0.1%
511     1       < 0.1%
505     1       < 0.1%
481     1       < 0.1%
480     1       < 0.1%
461     1       < 0.1%
449     1       < 0.1%
445     1       < 0.1%
443     1       < 0.1%

NoEmp
Real number (ℝ≥0)

SKEWED

Distinct: 599
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.41135321
Minimum: 0
Maximum: 9999
Zeros: 6631
Zeros (%): 0.7%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB
2022-06-22T02:02:23.107077image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/

Quantile statistics

Minimum0
5-th percentile1
Q12
median4
Q310
95-th percentile40
Maximum9999
Range9999
Interquartile range (IQR)8

Descriptive statistics

Standard deviation74.10819634
Coefficient of variation (CV)6.494251379
Kurtosis7965.288643
Mean11.41135321
Median Absolute Deviation (MAD)3
Skewness80.24824355
Sum10260678
Variance5492.024764
MonotonicityNot monotonic
2022-06-22T02:02:23.164292image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
1                   154254  17.2%
2                   138297  15.4%
3                   90674   10.1%
4                   73644   8.2%
5                   60319   6.7%
6                   45759   5.1%
10                  31536   3.5%
7                   31495   3.5%
8                   31361   3.5%
12                  20822   2.3%
Other values (589)  221003  24.6%

Value  Count   Frequency (%)
0      6631    0.7%
1      154254  17.2%
2      138297  15.4%
3      90674   10.1%
4      73644   8.2%
5      60319   6.7%
6      45759   5.1%
7      31495   3.5%
8      31361   3.5%
9      18131   2.0%

Value  Count  Frequency (%)
9999   4      < 0.1%
9992   1      < 0.1%
9945   1      < 0.1%
9090   1      < 0.1%
9000   2      < 0.1%
8500   1      < 0.1%
8041   1      < 0.1%
8018   1      < 0.1%
8000   7      < 0.1%
7999   1      < 0.1%

NewExist
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 136
Missing (%): < 0.1%
Memory size: 6.9 MiB

Value  Count
1.0    644869
2.0    253125
0.0    1034

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2697084
Distinct characters: 4
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 2.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value      Count   Frequency (%)
1.0        644869  71.7%
2.0        253125  28.2%
0.0        1034    0.1%
(Missing)  136     < 0.1%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
1.0    644869  71.7%
2.0    253125  28.2%
0.0    1034    0.1%

Most occurring characters

Value  Count   Frequency (%)
0      900062  33.4%
.      899028  33.3%
1      644869  23.9%
2      253125  9.4%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     1798056  66.7%
Other Punctuation  899028   33.3%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
0      900062  50.1%
1      644869  35.9%
2      253125  14.1%
Other Punctuation
Value  Count   Frequency (%)
.      899028  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  2697084  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
0      900062  33.4%
.      899028  33.3%
1      644869  23.9%
2      253125  9.4%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  2697084  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
0      900062  33.4%
.      899028  33.3%
1      644869  23.9%
2      253125  9.4%

CreateJob
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 246
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.430376439
Minimum: 0
Maximum: 8800
Zeros: 629248
Zeros (%): 70.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 10
Maximum: 8800
Range: 8800
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 236.6881652
Coefficient of variation (CV): 28.07563422
Kurtosis: 1369.91097
Mean: 8.430376439
Median Absolute Deviation (MAD): 0
Skewness: 36.99135473
Sum: 7580291
Variance: 56021.28756
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
0                   629248  70.0%
1                   63174   7.0%
2                   57831   6.4%
3                   28806   3.2%
4                   20511   2.3%
5                   18691   2.1%
10                  11602   1.3%
6                   11009   1.2%
8                   7378    0.8%
7                   6374    0.7%
Other values (236)  44540   5.0%

Value  Count   Frequency (%)
0      629248  70.0%
1      63174   7.0%
2      57831   6.4%
3      28806   3.2%
4      20511   2.3%
5      18691   2.1%
6      11009   1.2%
7      6374    0.7%
8      7378    0.8%
9      3330    0.4%

Value  Count  Frequency (%)
8800   648    0.1%
5621   1      < 0.1%
5199   1      < 0.1%
5085   1      < 0.1%
3500   1      < 0.1%
3100   1      < 0.1%
3000   4      < 0.1%
2515   1      < 0.1%
2140   1      < 0.1%
2020   1      < 0.1%

RetainedJob
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
SKEWED
ZEROS

Distinct: 358
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.79725723
Minimum: 0
Maximum: 9500
Zeros: 440403
Zeros (%): 49.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 1
Q3: 4
95-th percentile: 20
Maximum: 9500
Range: 9500
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 237.1205997
Coefficient of variation (CV): 21.96118835
Kurtosis: 1362.018162
Mean: 10.79725723
Median Absolute Deviation (MAD): 1
Skewness: 36.85481184
Sum: 9708505
Variance: 56226.1788
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
0                   440403  49.0%
1                   88790   9.9%
2                   76851   8.5%
3                   49963   5.6%
4                   39666   4.4%
5                   32627   3.6%
6                   23796   2.6%
7                   16530   1.8%
8                   15698   1.7%
10                  15438   1.7%
Other values (348)  99402   11.1%

Value  Count   Frequency (%)
0      440403  49.0%
1      88790   9.9%
2      76851   8.5%
3      49963   5.6%
4      39666   4.4%
5      32627   3.6%
6      23796   2.6%
7      16530   1.8%
8      15698   1.7%
9      8735    1.0%

Value  Count  Frequency (%)
9500   1      < 0.1%
8800   648    0.1%
7250   1      < 0.1%
5000   1      < 0.1%
4441   1      < 0.1%
4000   2      < 0.1%
3900   1      < 0.1%
3860   1      < 0.1%
3225   1      < 0.1%
3200   1      < 0.1%

FranchiseCode
Real number (ℝ≥0)

ZEROS

Distinct: 2768
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2753.725933
Minimum: 0
Maximum: 99999
Zeros: 208835
Zeros (%): 23.2%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 1
95-th percentile: 15805
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 12758.01914
Coefficient of variation (CV): 4.633002501
Kurtosis: 24.40952381
Mean: 2753.725933
Median Absolute Deviation (MAD): 0
Skewness: 4.975215215
Sum: 2476051225
Variance: 162767052.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                Count   Frequency (%)
1                    638554  71.0%
0                    208835  23.2%
78760                3373    0.4%
68020                1921    0.2%
50564                1034    0.1%
21780                1003    0.1%
25650                715     0.1%
79140                659     0.1%
22470                615     0.1%
17998                606     0.1%
Other values (2758)  41849   4.7%
ValueCountFrequency (%)
0208835
 
23.2%
1638554
71.0%
312
 
< 0.1%
3955
 
< 0.1%
3993
 
< 0.1%
4002
 
< 0.1%
40112
 
< 0.1%
4041
 
< 0.1%
40734
 
< 0.1%
4142
 
< 0.1%
Value  Count  Frequency (%)
99999  1      < 0.1%
92006  4      < 0.1%
92000  9      < 0.1%
91999  11     < 0.1%
91450  2      < 0.1%
91446  1      < 0.1%
91443  2      < 0.1%
91435  1      < 0.1%
91424  1      < 0.1%
91423  2      < 0.1%

UrbanRural
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

Value  Count
1      470654
0      323167
2      105343

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 899164
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Most occurring characters

Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  899164  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Most occurring scripts

Value   Count   Frequency (%)
Common  899164  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  899164  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
1      470654  52.3%
0      323167  35.9%
2      105343  11.7%

RevLineCr
Categorical

HIGH CORRELATION

Distinct: 18
Distinct (%): < 0.1%
Missing: 4528
Missing (%): 0.5%
Memory size: 6.9 MiB

Value              Count
N                  420288
0                  257602
Y                  201397
T                  15284
1                  23
Other values (13)  42

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 894636
Distinct characters: 18
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: N
2nd row: N
3rd row: N
4th row: N
5th row: N

Common Values

Value             Count   Frequency (%)
N                 420288  46.7%
0                 257602  28.6%
Y                 201397  22.4%
T                 15284   1.7%
1                 23      < 0.1%
R                 14      < 0.1%
`                 11      < 0.1%
2                 6       < 0.1%
C                 2       < 0.1%
5                 1       < 0.1%
Other values (8)  8       < 0.1%
(Missing)         4528    0.5%

Length

Histogram of lengths of the category

Value             Count   Frequency (%)
n                 420288  47.0%
0                 257602  28.8%
y                 201397  22.5%
t                 15284   1.7%
1                 23      < 0.1%
r                 14      < 0.1%
(empty)           14      < 0.1%
2                 6       < 0.1%
c                 2       < 0.1%
5                 1       < 0.1%
Other values (5)  5       < 0.1%

Most occurring characters

ValueCountFrequency (%)
N420288
47.0%
0257602
28.8%
Y201397
22.5%
T15284
 
1.7%
123
 
< 0.1%
R14
 
< 0.1%
`11
 
< 0.1%
26
 
< 0.1%
C2
 
< 0.1%
31
 
< 0.1%
Other values (8)8
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter636987
71.2%
Decimal Number257635
28.8%
Modifier Symbol11
 
< 0.1%
Other Punctuation2
 
< 0.1%
Dash Punctuation1
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N420288
66.0%
Y201397
31.6%
T15284
 
2.4%
R14
 
< 0.1%
C2
 
< 0.1%
A1
 
< 0.1%
Q1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0257602
> 99.9%
123
 
< 0.1%
26
 
< 0.1%
31
 
< 0.1%
71
 
< 0.1%
51
 
< 0.1%
41
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
,1
50.0%
.1
50.0%
Modifier Symbol
ValueCountFrequency (%)
`11
100.0%
Dash Punctuation
ValueCountFrequency (%)
-1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin636987
71.2%
Common257649
28.8%

Most frequent character per script

Common
ValueCountFrequency (%)
0257602
> 99.9%
123
 
< 0.1%
`11
 
< 0.1%
26
 
< 0.1%
31
 
< 0.1%
,1
 
< 0.1%
71
 
< 0.1%
51
 
< 0.1%
.1
 
< 0.1%
41
 
< 0.1%
Latin
ValueCountFrequency (%)
N420288
66.0%
Y201397
31.6%
T15284
 
2.4%
R14
 
< 0.1%
C2
 
< 0.1%
A1
 
< 0.1%
Q1
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII894636
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N420288
47.0%
0257602
28.8%
Y201397
22.5%
T15284
 
1.7%
123
 
< 0.1%
R14
 
< 0.1%
`11
 
< 0.1%
26
 
< 0.1%
C2
 
< 0.1%
31
 
< 0.1%
Other values (8)8
 
< 0.1%
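
The value tables above show that RevLineCr mixes the dominant `Y`/`N` codes with `0`, `T`, digits, and stray punctuation. A minimal cleaning sketch follows. It assumes that only `Y` and `N` are meaningful and that every other code should be treated as unknown; the report itself does not define the codes, and `normalize_flag` is a hypothetical helper name.

```python
from typing import Optional


def normalize_flag(raw: Optional[str]) -> Optional[bool]:
    """Map a raw flag to True/False, or None when missing or unrecognised."""
    if raw is None:
        return None
    token = raw.strip().upper()
    if token == "Y":
        return True
    if token == "N":
        return False
    # Assumption: '0', 'T', '`', digits and punctuation carry no usable meaning.
    return None


samples = ["N", "Y", "0", "T", "`", None]
print([normalize_flag(s) for s in samples])
```

Whether `0` and `T` should instead map to a defined category is a judgment call that depends on the data dictionary, which is not part of this report.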

LowDoc
Categorical

Distinct: 8
Distinct (%): < 0.1%
Missing: 2582
Missing (%): 0.3%
Memory size: 6.9 MiB

Value             Count
N                 782822
Y                 110335
0                 1491
C                 758
S                 603
Other values (3)  573

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 896582
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Y
2nd row: Y
3rd row: N
4th row: Y
5th row: N

Common Values

Value      Count   Frequency (%)
N          782822  87.1%
Y          110335  12.3%
0          1491    0.2%
C          758     0.1%
S          603     0.1%
A          497     0.1%
R          75      < 0.1%
1          1       < 0.1%
(Missing)  2582    0.3%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count   Frequency (%)
n      782822  87.3%
y      110335  12.3%
0      1491    0.2%
c      758     0.1%
s      603     0.1%
a      497     0.1%
r      75      < 0.1%
1      1       < 0.1%

Most occurring characters

ValueCountFrequency (%)
N782822
87.3%
Y110335
 
12.3%
01491
 
0.2%
C758
 
0.1%
S603
 
0.1%
A497
 
0.1%
R75
 
< 0.1%
11
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter895090
99.8%
Decimal Number1492
 
0.2%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N782822
87.5%
Y110335
 
12.3%
C758
 
0.1%
S603
 
0.1%
A497
 
0.1%
R75
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
01491
99.9%
11
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin895090
99.8%
Common1492
 
0.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
N782822
87.5%
Y110335
 
12.3%
C758
 
0.1%
S603
 
0.1%
A497
 
0.1%
R75
 
< 0.1%
Common
ValueCountFrequency (%)
01491
99.9%
11
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII896582
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N782822
87.3%
Y110335
 
12.3%
01491
 
0.2%
C758
 
0.1%
S603
 
0.1%
A497
 
0.1%
R75
 
< 0.1%
11
 
< 0.1%

ChgOffDate
Categorical

HIGH CARDINALITY
MISSING

Distinct: 6448
Distinct (%): 4.0%
Missing: 736465
Missing (%): 81.9%
Memory size: 6.9 MiB

Value                Count
13-Mar-10            734
20-Feb-10            614
30-Jan-10            519
6-Feb-10             461
6-Mar-10             422
Other values (6443)  159949

Length

Max length: 9
Median length: 9
Mean length: 8.716304341
Min length: 8

Characters and Unicode

Total characters: 1418134
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 861
Unique (%): 0.5%

Sample

1st row: 24-Jun-91
2nd row: 18-Apr-02
3rd row: 4-Oct-89
4th row: 26-Jun-14
5th row: 4-Oct-05

Common Values

Value                Count   Frequency (%)
13-Mar-10            734     0.1%
20-Feb-10            614     0.1%
30-Jan-10            519     0.1%
6-Feb-10             461     0.1%
6-Mar-10             422     < 0.1%
10-Jun-10            415     < 0.1%
20-Mar-10            414     < 0.1%
13-Feb-10            400     < 0.1%
7-Jun-10             350     < 0.1%
3-Jun-10             338     < 0.1%
Other values (6438)  158032  17.6%
(Missing)            736465  81.9%

Length

Histogram of lengths of the category

Value                Count   Frequency (%)
13-mar-10            734     0.5%
20-feb-10            614     0.4%
30-jan-10            519     0.3%
6-feb-10             461     0.3%
6-mar-10             422     0.3%
10-jun-10            415     0.3%
20-mar-10            414     0.3%
13-feb-10            400     0.2%
7-jun-10             350     0.2%
3-jun-10             338     0.2%
Other values (6438)  158032  97.1%

Most occurring characters

ValueCountFrequency (%)
-325398
22.9%
1177588
 
12.5%
0126799
 
8.9%
283425
 
5.9%
u48822
 
3.4%
946885
 
3.3%
J44922
 
3.2%
a43197
 
3.0%
838336
 
2.7%
e37857
 
2.7%
Other values (23)444905
31.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number604639
42.6%
Dash Punctuation325398
22.9%
Lowercase Letter325398
22.9%
Uppercase Letter162699
 
11.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
u48822
15.0%
a43197
13.3%
e37857
11.6%
n30637
9.4%
r28866
8.9%
p28398
8.7%
c21231
6.5%
g16046
 
4.9%
y15627
 
4.8%
l14285
 
4.4%
Other values (4)40432
12.4%
Decimal Number
ValueCountFrequency (%)
1177588
29.4%
0126799
21.0%
283425
13.8%
946885
 
7.8%
838336
 
6.3%
337546
 
6.2%
628366
 
4.7%
723654
 
3.9%
422727
 
3.8%
519313
 
3.2%
Uppercase Letter
ValueCountFrequency (%)
J44922
27.6%
M31051
19.1%
A29488
18.1%
S14956
 
9.2%
F12352
 
7.6%
O10682
 
6.6%
D10549
 
6.5%
N8699
 
5.3%
Dash Punctuation
ValueCountFrequency (%)
-325398
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common930037
65.6%
Latin488097
34.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
u48822
 
10.0%
J44922
 
9.2%
a43197
 
8.9%
e37857
 
7.8%
M31051
 
6.4%
n30637
 
6.3%
A29488
 
6.0%
r28866
 
5.9%
p28398
 
5.8%
c21231
 
4.3%
Other values (12)143628
29.4%
Common
ValueCountFrequency (%)
-325398
35.0%
1177588
19.1%
0126799
 
13.6%
283425
 
9.0%
946885
 
5.0%
838336
 
4.1%
337546
 
4.0%
628366
 
3.0%
723654
 
2.5%
422727
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII1418134
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
-325398
22.9%
1177588
 
12.5%
0126799
 
8.9%
283425
 
5.9%
u48822
 
3.4%
946885
 
3.3%
J44922
 
3.2%
a43197
 
3.0%
838336
 
2.7%
e37857
 
2.7%
Other values (23)444905
31.4%
Distinct: 8472
Distinct (%): 0.9%
Missing: 2368
Missing (%): 0.3%
Memory size: 6.9 MiB
Minimum: 1972-02-01 00:00:00
Maximum: 2071-12-31 00:00:00
Histogram with fixed size bins (bins=50)
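
The DateTime summary above reports a maximum of 2071-12-31, which lies decades after the 2022 timestamps embedded in this report. One plausible cause, offered here as an assumption rather than something the report confirms, is that two-digit years in `D-Mon-YY` strings (the format visible in the ChgOffDate samples) were expanded into the wrong century. The sketch below shows one way to parse such strings with an explicit cutoff; `parse_dmy` and `latest_year` are hypothetical names, and `"31-Dec-71"` is an invented input used only for illustration.

```python
from datetime import date

MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}


def parse_dmy(text: str, latest_year: int) -> date:
    """Parse 'D-Mon-YY', picking the century so the year never exceeds latest_year."""
    day, mon, yy = text.strip().split("-")
    year = (latest_year // 100) * 100 + int(yy)
    if year > latest_year:
        year -= 100  # a date in the future is assumed to belong to the previous century
    return date(year, MONTHS[mon.title()], int(day))


for raw in ["24-Jun-91", "26-Jun-14", "31-Dec-71"]:
    print(raw, "->", parse_dmy(raw, latest_year=2022).isoformat())
```

The cutoff of 2022 is taken from the report's own timestamps; a different cutoff may be more appropriate if the data was collected earlier.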

DisbursementGross
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 118859
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 201154.0167
Minimum: 0
Maximum: 11446325
Zeros: 196
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 10000
Q1: 42000
Median: 100000
Q3: 238000
95-th percentile: 761892.5
Maximum: 11446325
Range: 11446325
Interquartile range (IQR): 196000

Descriptive statistics

Standard deviation: 287640.85
Coefficient of variation (CV): 1.4299533
Kurtosis: 35.08859907
Mean: 201154.0167
Median Absolute Deviation (MAD): 70000
Skewness: 3.940992083
Sum: 1.808704503 × 10^11
Variance: 8.273725858 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
50000                  43787   4.9%
100000                 36714   4.1%
25000                  27387   3.0%
150000                 23373   2.6%
10000                  21328   2.4%
35000                  14748   1.6%
5000                   14193   1.6%
75000                  13528   1.5%
20000                  13462   1.5%
30000                  12696   1.4%
Other values (118849)  677948  75.4%

Value  Count  Frequency (%)
0      196    < 0.1%
1      11     < 0.1%
2      3      < 0.1%
3      3      < 0.1%
4      3      < 0.1%
5      2      < 0.1%
6      4      < 0.1%
7      3      < 0.1%
8      1      < 0.1%
9      1      < 0.1%

Value     Count  Frequency (%)
11446325  1      < 0.1%
11000000  1      < 0.1%
10465000  1      < 0.1%
9284449   1      < 0.1%
8995000   1      < 0.1%
8607858   1      < 0.1%
8602584   1      < 0.1%
7853275   1      < 0.1%
7699233   1      < 0.1%
7573881   1      < 0.1%

BalanceGross
Categorical

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.9 MiB

Value              Count
$0.00              899150
$12,750.00         1
$827,875.00        1
$25,000.00         1
$37,100.00         1
Other values (10)  10

Length

Max length: 12
Median length: 6
Mean length: 6.000076738
Min length: 6

Characters and Unicode

Total characters: 5395053
Distinct characters: 14
Distinct categories: 4
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 14
Unique (%): < 0.1%

Sample

1st row: $0.00
2nd row: $0.00
3rd row: $0.00
4th row: $0.00
5th row: $0.00

Common Values

Value             Count   Frequency (%)
$0.00             899150  > 99.9%
$12,750.00        1       < 0.1%
$827,875.00       1       < 0.1%
$25,000.00        1       < 0.1%
$37,100.00        1       < 0.1%
$43,127.00        1       < 0.1%
$84,617.00        1       < 0.1%
$1,760.00         1       < 0.1%
$115,820.00       1       < 0.1%
$996,262.00       1       < 0.1%
Other values (5)  5       < 0.1%

Length

Histogram of lengths of the category

Value             Count   Frequency (%)
0.00              899150  > 99.9%
12,750.00         1       < 0.1%
827,875.00        1       < 0.1%
25,000.00         1       < 0.1%
37,100.00         1       < 0.1%
43,127.00         1       < 0.1%
84,617.00         1       < 0.1%
1,760.00          1       < 0.1%
115,820.00        1       < 0.1%
996,262.00        1       < 0.1%
Other values (5)  5       < 0.1%

Most occurring characters

Value             Count    Frequency (%)
0                 2697490  50.0%
$                 899164   16.7%
.                 899164   16.7%
(space)           899164   16.7%
,                 13       < 0.1%
1                 11       < 0.1%
7                 8        < 0.1%
2                 7        < 0.1%
6                 7        < 0.1%
9                 7        < 0.1%
Other values (4)  18       < 0.1%

Most occurring categories

Value              Count    Frequency (%)
Decimal Number     2697548  50.0%
Other Punctuation  899177   16.7%
Currency Symbol    899164   16.7%
Space Separator    899164   16.7%

Most frequent character per category

Decimal Number
Value  Count    Frequency (%)
0      2697490  > 99.9%
1      11       < 0.1%
7      8        < 0.1%
2      7        < 0.1%
6      7        < 0.1%
9      7        < 0.1%
5      6        < 0.1%
8      5        < 0.1%
4      4        < 0.1%
3      3        < 0.1%
Other Punctuation
Value  Count   Frequency (%)
.      899164  > 99.9%
,      13      < 0.1%
Currency Symbol
Value  Count   Frequency (%)
$      899164  100.0%
Space Separator
Value    Count   Frequency (%)
(space)  899164  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Common  5395053  100.0%

Most frequent character per script

Common
Value             Count    Frequency (%)
0                 2697490  50.0%
$                 899164   16.7%
.                 899164   16.7%
(space)           899164   16.7%
,                 13       < 0.1%
1                 11       < 0.1%
7                 8        < 0.1%
2                 7        < 0.1%
6                 7        < 0.1%
9                 7        < 0.1%
Other values (4)  18       < 0.1%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  5395053  100.0%

Most frequent character per block

ASCII
Value             Count    Frequency (%)
0                 2697490  50.0%
$                 899164   16.7%
.                 899164   16.7%
(space)           899164   16.7%
,                 13       < 0.1%
1                 11       < 0.1%
7                 8        < 0.1%
2                 7        < 0.1%
6                 7        < 0.1%
9                 7        < 0.1%
Other values (4)  18       < 0.1%
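
BalanceGross is typed as Categorical because every value is a currency string: the character tables above show one `$`, one decimal point, and one space per row, plus occasional thousands separators. A minimal sketch of converting such strings to exact decimals follows; `parse_currency` is a hypothetical helper, and the sketch assumes US-style formatting as seen in the samples.

```python
from decimal import Decimal


def parse_currency(text: str) -> Decimal:
    """Convert a string such as '$827,875.00 ' to a Decimal amount."""
    # Strip surrounding whitespace, then drop the currency symbol and
    # thousands separators before handing the digits to Decimal.
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


for raw in ["$0.00 ", "$12,750.00 ", "$827,875.00 "]:
    print(repr(raw), "->", parse_currency(raw))
```

`Decimal` is used instead of `float` so that cent values survive the conversion exactly.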

MIS_Status
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): < 0.1%
Missing: 1997
Missing (%): 0.2%
Memory size: 6.9 MiB

Value   Count
P I F   739609
CHGOFF  157558

Length

Max length: 6
Median length: 5
Mean length: 5.175617249
Min length: 5

Characters and Unicode

Total characters: 4643393
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: P I F
2nd row: P I F
3rd row: P I F
4th row: P I F
5th row: P I F

Common Values

Value      Count   Frequency (%)
P I F      739609  82.3%
CHGOFF     157558  17.5%
(Missing)  1997    0.2%

Length

Histogram of lengths of the category

Category Frequency Plot

Value   Count   Frequency (%)
p       739609  31.1%
i       739609  31.1%
f       739609  31.1%
chgoff  157558  6.6%

Most occurring characters

Value    Count    Frequency (%)
(space)  1479218  31.9%
F        1054725  22.7%
P        739609   15.9%
I        739609   15.9%
C        157558   3.4%
H        157558   3.4%
G        157558   3.4%
O        157558   3.4%

Most occurring categories

Value             Count    Frequency (%)
Uppercase Letter  3164175  68.1%
Space Separator   1479218  31.9%

Most frequent character per category

Uppercase Letter
Value  Count    Frequency (%)
F      1054725  33.3%
P      739609   23.4%
I      739609   23.4%
C      157558   5.0%
H      157558   5.0%
G      157558   5.0%
O      157558   5.0%
Space Separator
Value    Count    Frequency (%)
(space)  1479218  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   3164175  68.1%
Common  1479218  31.9%

Most frequent character per script

Latin
Value  Count    Frequency (%)
F      1054725  33.3%
P      739609   23.4%
I      739609   23.4%
C      157558   5.0%
H      157558   5.0%
G      157558   5.0%
O      157558   5.0%
Common
Value    Count    Frequency (%)
(space)  1479218  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4643393  100.0%

Most frequent character per block

ASCII
Value    Count    Frequency (%)
(space)  1479218  31.9%
F        1054725  22.7%
P        739609   15.9%
I        739609   15.9%
C        157558   3.4%
H        157558   3.4%
G        157558   3.4%
O        157558   3.4%
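
MIS_Status holds two labels plus a small number of missing rows, so it can be turned into a binary indicator. The sketch below is illustrative only: it assumes `CHGOFF` marks a charged-off loan and `P I F` a loan paid in full, which the report does not state, and `charged_off` is a hypothetical helper. The counts are copied from the Common Values table.

```python
from typing import Optional


def charged_off(status: Optional[str]) -> Optional[int]:
    """Return 1 for 'CHGOFF', 0 for 'P I F', and None for anything else."""
    if status is None:
        return None
    key = status.replace(" ", "").upper()
    if key == "CHGOFF":
        return 1
    if key == "PIF":
        return 0
    return None


# Counts copied from the MIS_Status Common Values table (missing rows excluded).
counts = {"P I F": 739609, "CHGOFF": 157558}
labelled = sum(counts.values())
rate = counts["CHGOFF"] / labelled
print(f"share of CHGOFF among labelled rows: {rate:.4f}")
```

The resulting share matches the fractional part of the reported mean length (5.175617249), since `CHGOFF` is exactly one character longer than `P I F`.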

ChgOffPrinGr
Real number (ℝ≥0)

ZEROS

Distinct: 83165
Distinct (%): 9.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13503.29513
Minimum: 0
Maximum: 3512596
Zeros: 737152
Zeros (%): 82.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 64888.85
Maximum: 3512596
Range: 3512596
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 65152.29269
Coefficient of variation (CV): 4.824918072
Kurtosis: 184.3191639
Mean: 13503.29513
Median Absolute Deviation (MAD): 0
Skewness: 11.22096997
Sum: 1.214167686 × 10^10
Variance: 4244821243
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
0                     737152  82.0%
50000                 2110    0.2%
10000                 1865    0.2%
25000                 1371    0.2%
35000                 1345    0.1%
100000                1028    0.1%
20000                 594     0.1%
30000                 492     0.1%
15000                 467     0.1%
5000                  356     < 0.1%
Other values (83155)  152384  16.9%

Value  Count   Frequency (%)
0      737152  82.0%
1      6       < 0.1%
3      3       < 0.1%
4      2       < 0.1%
5      5       < 0.1%
6      3       < 0.1%
8      1       < 0.1%
9      1       < 0.1%
10     1       < 0.1%
11     3       < 0.1%

Value    Count  Frequency (%)
3512596  1      < 0.1%
2223766  1      < 0.1%
2157499  1      < 0.1%
1999999  1      < 0.1%
1961398  1      < 0.1%
1933715  1      < 0.1%
1932180  1      < 0.1%
1931439  1      < 0.1%
1926148  1      < 0.1%
1917676  1      < 0.1%

GrAppv
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 22128
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 192686.9764
Minimum: 200
Maximum: 5472000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 200
5-th percentile: 10000
Q1: 35000
Median: 90000
Q3: 225000
95-th percentile: 750000
Maximum: 5472000
Range: 5471800
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 283263.3913
Coefficient of variation (CV): 1.470070249
Kurtosis: 21.01888249
Mean: 192686.9764
Median Absolute Deviation (MAD): 65000
Skewness: 3.520790055
Sum: 1.732571924 × 10^11
Variance: 8.023814885 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
50000                 69394   7.7%
25000                 51258   5.7%
100000                50977   5.7%
10000                 38366   4.3%
150000                27624   3.1%
20000                 23434   2.6%
35000                 23181   2.6%
30000                 21004   2.3%
5000                  19146   2.1%
15000                 18472   2.1%
Other values (22118)  556308  61.9%

Value  Count  Frequency (%)
200    2      < 0.1%
300    1      < 0.1%
400    2      < 0.1%
500    33     < 0.1%
700    4      < 0.1%
800    4      < 0.1%
950    1      < 0.1%
1000   444    < 0.1%
1200   12     < 0.1%
1300   2      < 0.1%

Value    Count  Frequency (%)
5472000  1      < 0.1%
5000000  40     < 0.1%
4991700  1      < 0.1%
4950000  1      < 0.1%
4908500  1      < 0.1%
4900000  2      < 0.1%
4872000  1      < 0.1%
4869000  1      < 0.1%
4830000  1      < 0.1%
4800000  1      < 0.1%

SBA_Appv
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 38326
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 149488.7882
Minimum: 100
Maximum: 5472000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 MiB

Quantile statistics

Minimum: 100
5-th percentile: 5000
Q1: 21250
Median: 61250
Q3: 175000
95-th percentile: 626250
Maximum: 5472000
Range: 5471900
Interquartile range (IQR): 153750

Descriptive statistics

Standard deviation: 228414.5615
Coefficient of variation (CV): 1.52797119
Kurtosis: 25.32551382
Mean: 149488.7882
Median Absolute Deviation (MAD): 48750
Skewness: 3.675275286
Sum: 1.344149367 × 10^11
Variance: 5.217321191 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
25000                 49579   5.5%
12500                 40147   4.5%
5000                  31135   3.5%
50000                 25047   2.8%
10000                 17009   1.9%
17500                 16141   1.8%
15000                 14490   1.6%
7500                  12781   1.4%
127500                11946   1.3%
80000                 10965   1.2%
Other values (38316)  669924  74.5%

Value  Count  Frequency (%)
100    2      < 0.1%
150    1      < 0.1%
200    2      < 0.1%
250    33     < 0.1%
350    4      < 0.1%
400    4      < 0.1%
475    1      < 0.1%
500    442    < 0.1%
600    12     < 0.1%
650    2      < 0.1%

Value    Count  Frequency (%)
5472000  1      < 0.1%
5000000  1      < 0.1%
4869000  1      < 0.1%
4582000  1      < 0.1%
4500000  23     < 0.1%
4492530  1      < 0.1%
4410000  1      < 0.1%
4320000  1      < 0.1%
4050000  4      < 0.1%
4000000  13     < 0.1%

Interactions

2022-06-22T02:01:28.151757image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:35.491535image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:42.754184image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.147318image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:57.498207image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:04.922055image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:12.492784image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:24.219926image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:31.792156image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:55.027370image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:13.604643image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:20.960215image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:28.277876image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:35.617207image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:42.882535image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.279773image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:57.623287image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:05.057702image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:12.631293image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:24.354065image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:31.939557image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:57.315164image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:13.749881image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:21.090522image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:28.405297image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:35.747125image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:43.013850image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.415708image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:57.758606image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:05.199285image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:12.768740image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:24.487211image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:32.093932image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:59.741726image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:13.892193image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:21.220723image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:28.535495image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:35.878281image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:43.147646image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.554969image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:57.895551image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:05.341464image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:12.916462image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:24.626951image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:32.244756image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:02.154319image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:14.041635image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:21.357977image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:28.670995image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:36.008807image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:43.284715image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.697962image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:58.039637image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:05.478981image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:13.056664image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:24.760419image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:00:32.402496image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:04.458727image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:14.187767image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:21.490482image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:28.805015image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:36.133908image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:43.416833image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:50.838499image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:01:58.175541image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/
2022-06-22T02:02:05.617830image/svg+xmlMatplotlib v3.5.2, https://matplotlib.org/

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
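
The calculation described above can be sketched in a few lines of plain Python. This is only an illustration, not the code the report generator runs; the function name `pearson_r` is an assumption, and the example values are the GrAppv and SBA_Appv entries of the first five sample rows of this report.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of both standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# GrAppv and SBA_Appv from the first five sample rows of this report.
gr_appv = [60000.0, 40000.0, 287000.0, 35000.0, 229000.0]
sba_appv = [48000.0, 32000.0, 215250.0, 28000.0, 229000.0]
print(pearson_r(gr_appv, sba_appv))

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
rescaled = [2.0 * v + 1000.0 for v in sba_appv]
print(pearson_r(gr_appv, rescaled))
```

Five rows are far too few to say anything about the full dataset; the point is only to make the formula concrete.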

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
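
As a rough sketch of the rank-based calculation above: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are assumptions made for this example, and giving tied values the average of their ranks is a common convention rather than something this report specifies.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x**3), made up for illustration.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because only the ordering of the values matters, any strictly increasing transformation of x or y leaves ρ unchanged.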

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
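
For illustration, here is a minimal sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013): the chi-squared-based φ² and the table dimensions are each shrunk by a small-sample correction before taking the ratio. The function name `cramers_v_corrected` and the toy data are assumptions, and the report's own implementation may differ in its details.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return math.nan  # undefined when a variable has a single category
    return math.sqrt(phi2_corr / denom)


# Toy labels, made up for illustration: the first pair is perfectly associated,
# the second pair is independent by construction.
print(cramers_v_corrected(["a", "b"] * 50, ["x", "y"] * 50))
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))
```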

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
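
The nullity correlation shown in the heatmap can be read as an ordinary Pearson correlation between the missing/present indicators of two columns. The sketch below assumes that interpretation; the function name `nullity_correlation` is hypothetical, and the toy records (including the second column name `ExampleDate`) are made up for illustration rather than taken from the dataset.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns.

    rows is a list of dicts; a value of None marks a missing cell.
    """
    a = [1.0 if row[col_a] is None else 0.0 for row in rows]
    b = [1.0 if row[col_b] is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return math.nan  # a column that is always or never missing has no correlation
    return cov / math.sqrt(var_a * var_b)


# Made-up records: the two columns are missing in exactly the same rows.
toy_rows = [
    {"ChgOffDate": None, "ExampleDate": None},
    {"ChgOffDate": "24-Jun-91", "ExampleDate": "1991-06-24"},
    {"ChgOffDate": None, "ExampleDate": None},
    {"ChgOffDate": "8-Mar-00", "ExampleDate": "2000-03-08"},
]
print(nullity_correlation(toy_rows, "ChgOffDate", "ExampleDate"))
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present exactly when the other is missing, and a value near 0 means their missingness is unrelated.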

Sample

First rows

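(index) | LoanNr_ChkDgt | Name | City | State | Zip | Bank | BankState | NAICS | ApprovalDate | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | UrbanRural | RevLineCr | LowDoc | ChgOffDate | DisbursementDate | DisbursementGross | BalanceGross | MIS_Status | ChgOffPrinGr | GrAppv | SBA_Appv
0 | 1000014003 | ABC HOBBYCRAFT | EVANSVILLE | IN | 47711 | FIFTH THIRD BANK | OH | 451120 | 28-Feb-97 | 1997 | 84 | 4 | 2.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1999-02-28 | 60000.0 | $0.00 | P I F | 0.0 | 60000.0 | 48000.0
1 | 1000024006 | LANDMARK BAR & GRILLE (THE) | NEW PARIS | IN | 46526 | 1ST SOURCE BANK | IN | 722410 | 28-Feb-97 | 1997 | 60 | 2 | 2.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1997-05-31 | 40000.0 | $0.00 | P I F | 0.0 | 40000.0 | 32000.0
2 | 1000034009 | WHITLOCK DDS, TODD M. | BLOOMINGTON | IN | 47401 | GRANT COUNTY STATE BANK | IN | 621210 | 28-Feb-97 | 1997 | 180 | 7 | 1.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-12-31 | 287000.0 | $0.00 | P I F | 0.0 | 287000.0 | 215250.0
3 | 1000044001 | BIG BUCKS PAWN & JEWELRY, LLC | BROKEN ARROW | OK | 74012 | 1ST NATL BK & TR CO OF BROKEN | OK | 0 | 28-Feb-97 | 1997 | 60 | 2 | 1.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1997-06-30 | 35000.0 | $0.00 | P I F | 0.0 | 35000.0 | 28000.0
4 | 1000054004 | ANASTASIA CONFECTIONS, INC. | ORLANDO | FL | 32801 | FLORIDA BUS. DEVEL CORP | FL | 0 | 28-Feb-97 | 1997 | 240 | 14 | 1.0 | 7 | 7 | 1 | 0 | N | N | NaN | 1997-05-14 | 229000.0 | $0.00 | P I F | 0.0 | 229000.0 | 229000.0
5 | 1000084002 | B&T SCREW MACHINE COMPANY, INC | PLAINVILLE | CT | 6062 | TD BANK, NATIONAL ASSOCIATION | DE | 332721 | 28-Feb-97 | 1997 | 120 | 19 | 1.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-06-30 | 517000.0 | $0.00 | P I F | 0.0 | 517000.0 | 387750.0
6 | 1000093009 | MIDDLE ATLANTIC SPORTS CO INC | UNION | NJ | 7083 | WELLS FARGO BANK NATL ASSOC | SD | 0 | 2-Jun-80 | 1980 | 45 | 45 | 2.0 | 0 | 0 | 0 | 0 | N | N | 24-Jun-91 | 1980-07-22 | 600000.0 | $0.00 | CHGOFF | 208959.0 | 600000.0 | 499998.0
7 | 1000094005 | WEAVER PRODUCTS | SUMMERFIELD | FL | 34491 | REGIONS BANK | AL | 811118 | 28-Feb-97 | 1997 | 84 | 1 | 2.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1998-06-30 | 45000.0 | $0.00 | P I F | 0.0 | 45000.0 | 36000.0
8 | 1000104006 | TURTLE BEACH INN | PORT SAINT JOE | FL | 32456 | CENTENNIAL BANK | FL | 721310 | 28-Feb-97 | 1997 | 297 | 2 | 2.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-07-31 | 305000.0 | $0.00 | P I F | 0.0 | 305000.0 | 228750.0
9 | 1000124001 | INTEXT BUILDING SYS LLC | GLASTONBURY | CT | 6073 | WEBSTER BANK NATL ASSOC | CT | 0 | 28-Feb-97 | 1997 | 84 | 3 | 2.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1997-04-30 | 70000.0 | $0.00 | P I F | 0.0 | 70000.0 | 56000.0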

Last rows

(index) | LoanNr_ChkDgt | Name | City | State | Zip | Bank | BankState | NAICS | ApprovalDate | ApprovalFY | Term | NoEmp | NewExist | CreateJob | RetainedJob | FranchiseCode | UrbanRural | RevLineCr | LowDoc | ChgOffDate | DisbursementDate | DisbursementGross | BalanceGross | MIS_Status | ChgOffPrinGr | GrAppv | SBA_Appv
899154 | 9995423005 | LITWIN LIVERY SERVICES, INC. | CAMPBELL | OH | 44405 | JPMORGAN CHASE BANK NATL ASSOC | IL | 0 | 27-Feb-97 | 1997 | 60 | 1 | 1.0 | 0 | 0 | 1 | 0 | 0 | N | NaN | 1997-09-30 | 10000.0 | $0.00 | P I F | 0.0 | 10000.0 | 5000.0
899155 | 9995453003 | FUTURE LEADERS CENTER, INC. | SO. OZONE PARK | NY | 11420 | FLUSHING BANK | NY | 624410 | 27-Feb-97 | 1997 | 180 | 2 | 1.0 | 0 | 0 | 1 | 0 | 0 | N | NaN | 1997-06-30 | 123000.0 | $0.00 | P I F | 0.0 | 128000.0 | 96000.0
899156 | 9995473009 | FABRICATORS STEEL, INC. | BALTIMORE | MD | 21224 | BANK OF AMERICA NATL ASSOC | MD | 332431 | 27-Feb-97 | 1997 | 60 | 20 | 1.0 | 0 | 0 | 1 | 0 | 0 | N | NaN | 1997-06-30 | 50000.0 | $0.00 | P I F | 0.0 | 50000.0 | 25000.0
899157 | 9995493004 | PULLTARPS MFG. | EL CAJON | CA | 92020 | U.S. BANK NATIONAL ASSOCIATION | CA | 314912 | 27-Feb-97 | 1997 | 36 | 40 | 1.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-03-31 | 200000.0 | $0.00 | P I F | 0.0 | 200000.0 | 150000.0
899158 | 9995563001 | SHADES WINDOW TINTING AUTO ALA | IRVING | TX | 75062 | LOANS FROM OLD CLOSED LENDERS | DC | 0 | 27-Feb-97 | 1997 | 84 | 5 | 2.0 | 0 | 0 | 1 | 0 | N | Y | NaN | 1997-06-30 | 79000.0 | $0.00 | P I F | 0.0 | 79000.0 | 63200.0
899159 | 9995573004 | FABRIC FARMS | UPPER ARLINGTON | OH | 43221 | JPMORGAN CHASE BANK NATL ASSOC | IL | 451120 | 27-Feb-97 | 1997 | 60 | 6 | 1.0 | 0 | 0 | 1 | 0 | 0 | N | NaN | 1997-09-30 | 70000.0 | $0.00 | P I F | 0.0 | 70000.0 | 56000.0
899160 | 9995603000 | FABRIC FARMS | COLUMBUS | OH | 43221 | JPMORGAN CHASE BANK NATL ASSOC | IL | 451130 | 27-Feb-97 | 1997 | 60 | 6 | 1.0 | 0 | 0 | 1 | 0 | Y | N | NaN | 1997-10-31 | 85000.0 | $0.00 | P I F | 0.0 | 85000.0 | 42500.0
899161 | 9995613003 | RADCO MANUFACTURING CO.,INC. | SANTA MARIA | CA | 93455 | RABOBANK, NATIONAL ASSOCIATION | CA | 332321 | 27-Feb-97 | 1997 | 108 | 26 | 1.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-09-30 | 300000.0 | $0.00 | P I F | 0.0 | 300000.0 | 225000.0
899162 | 9995973006 | MARUTAMA HAWAII, INC. | HONOLULU | HI | 96830 | BANK OF HAWAII | HI | 0 | 27-Feb-97 | 1997 | 60 | 6 | 1.0 | 0 | 0 | 1 | 0 | N | Y | 8-Mar-00 | 1997-03-31 | 75000.0 | $0.00 | CHGOFF | 46383.0 | 75000.0 | 60000.0
899163 | 9996003010 | PACIFIC TRADEWINDS FAN & LIGHT | KAILUA | HI | 96734 | CENTRAL PACIFIC BANK | HI | 0 | 27-Feb-97 | 1997 | 48 | 1 | 2.0 | 0 | 0 | 1 | 0 | N | N | NaN | 1997-05-31 | 30000.0 | $0.00 | P I F | 0.0 | 30000.0 | 24000.0